Original Paper
Abstract
Background: Artificial intelligence (AI) is increasingly proposed for use in health and health care systems. Beyond technical performance, public perceptions and affective responses influence whether AI technologies are accepted and adopted in real-world contexts. Social media platforms such as X (formerly Twitter) provide large-scale, real-time insight into public discourse surrounding emerging technologies, yet remain underused for examining how health AI is discussed, evaluated, and emotionally framed.
Objective: This study aimed to develop and apply large language model (LLM)–based methods for exploratory social listening on health AI. This is the first study to map large-scale sentiment, emotional expressions, and confidence-related signals in online discussions of applications of AI to health.
Methods: We collected 786,750 English-language posts from X (Twitter) published between January 1 and December 5, 2023, using health- and AI-related keywords. We benchmarked an LLM-based annotation framework using OpenAI’s GPT-3.5-Turbo and GPT-4, comparing model classifications with those of trained human researchers. Annotations included overall sentiment and 6 evaluative domains frequently referenced in the literature on attitudes toward health AI—usefulness, safety, privacy, ethics, quality, and trust. After cleaning, GPT-3.5-Turbo labeled 388,009 posts using the best-performing prompts. A subset (n=268,347) was further analyzed using Emollama-7b, an open-source model fine-tuned from Meta’s LLaMA2-7B, for emotion detection, and latent Dirichlet allocation for thematic analysis. Comparisons were made across World Health Organization regions.
Results: Compared against human annotations, optimized prompts achieved weighted F1-scores above 0.60 across evaluative domains and sentiment classification. Global discourse about health AI was 65.26% (95% CI 65.11%-65.4%) positive and 83.62% (95% CI 83.48%-83.76%) emotionally optimistic, although substantial regional variation was observed in sentiment (P<.001). The Eastern Mediterranean and South-East Asia regions expressed significantly higher levels of positive sentiment and evaluative agreement in the studied features of health AI, alongside frequent discussion of the tech industry and commercial development. In comparison, the Western Pacific region expressed lower confidence and significantly more mentions of research topics (19.27%, 95% CI 18.5%-20.07%). Privacy was the most prominent global concern, with 33.31% (95% CI 32.98%-33.66%) of privacy-related posts expressing perceived risks. In the Region of the Americas, 18.19% (95% CI 17.92%-18.44%) of posts discussed algorithms and data governance, significantly higher than overall.
Conclusions: This study offers the first systematic characterization of online health AI discourse at scale, mapping stances toward key features of AI, emotional tone, and discussion topics across regions. LLM-powered social listening is demonstrated as a feasible approach for identifying dominant narratives and regionally distinct concerns, capable of surfacing opinions absent from traditional media. This can extend to studying discourse on other evolving health technologies where public surveying is limited. While methodological refinement and multilingual expansion are needed, this framework can inform timely policy development, risk communication, and responsible health AI governance.
doi:10.2196/80346
Introduction
The release of large language model (LLM) chatbots like ChatGPT (OpenAI) in 2022 marked a turning point in the accessibility of artificial intelligence (AI), sparking public and research interest alike. By 2023, the rapid progress of LLMs extended to health care applications, leveraging the explosion of online data for training sophisticated AI models []. Although thousands of studies on AI in health care are published annually [], most remain at the proof-of-concept or pilot stage. Progress toward routine clinical deployment has been comparatively slow, reflecting unresolved challenges in validation, generalizability, regulatory approval, workflow integration, and safe implementation in high-stakes clinical environments [,].
As with vaccination and other system-level medical interventions, the impact of health AI depends not only on technical performance but on the stability of the social systems that maintain its use. In this sense, public and professional confidence functions as a form of infrastructure—less visible than algorithms or regulatory approvals, yet equally foundational to safe and scalable implementation. Without durable trust in safety, fairness, accountability, and data governance, even high-performing AI tools may fail to achieve uptake.
Although AI’s empirical capabilities are advancing rapidly, its translation into routine care is mediated by perception, politics, and institutional legitimacy [,]. Evidence shows that clinicians and the public hold complex, sometimes ambivalent views of health AI, shaped by concerns about safety, bias, accountability, ethics, and data use [-]. In high-stakes clinical settings, where consequences are immediate and personal, perceived risks can outweigh demonstrated benefits. Recognizing confidence as infrastructure implies that it must be actively maintained. Scholars have therefore called for continuous, scalable systems to monitor public and professional confidence in health AI over time—analogous to vaccine-confidence surveillance—so that governance, communication, and implementation strategies can adapt to emerging risks and concerns [].
One promising source of data for examining evolving perceptions of health AI is online platforms, where AI-related discussions surged following the release of ChatGPT in late 2022 []. Social media has surpassed traditional media as a major source of information, exerting considerable influence on public opinion formation and politics []. Platforms provide sites for exchange and exposure to new ideas in a public forum, yet not all ideas are amplified equally. Visibility on public forums is shaped by engagement-based algorithms, which tend to reward emotionally charged content.
A growing body of research indicates that content that invokes negative sentiments or high-arousal emotions, such as anger and sadness, is more likely to be perceived as believable and to spread widely [,]. This emotional content has boosted the dissemination of misinformation and fake news [,]. Because public awareness of health AI is generally low, even among some medical professionals [,,], information consumers might rely on their emotional responses over reasoning when consuming health AI-related media, a choice associated with greater susceptibility to misinformation [].
Measuring attitudes toward AI is itself a conceptual challenge. Unlike many previous technological innovations, AI systems exhibit anthropomorphic traits, broad applications, and inaccessible “black box” decision-making processes [,] that users struggle to conceptualize. Consequently, theoretical approaches to understanding confidence in AI span multiple domains, including technical performance, ethical governance, legal accountability, and societal alignment, without a clear consensus on priorities [,]. Support for one attribute of a health AI system, such as predictive accuracy, does not necessarily imply endorsement of others, such as ethical data use or privacy protections []. Therefore, a large-scale investigation into the cues in health AI confidence and emotion shared in online discourse is needed to better understand the information environment shaping public perceptions and governance challenges.
Achieving such large-scale insight presents additional methodological obstacles. Previous studies examining social media discourse on health AI have used manual content analysis, restricting analyses to a few thousand posts [,]. To address scalability, machine learning approaches like support vector machines have also been explored, but these models often exhibit poor generalizability, particularly when applied to health-related social media data [,]. Common natural language processing (NLP) techniques struggle with domain-specific language, sarcasm, and context-dependent framing, and require large, manually annotated datasets and extensive tuning to achieve acceptable performance [,]. These constraints in methodology and performance limit the feasibility of timely and large-scale assessments of public opinion.
LLMs have been proposed as a more adaptive and scalable alternative. By leveraging transformer-based architectures, LLMs can capture complex linguistic nuances and adapt to different contexts, while reducing reliance on task-specific training data []. Existing research has found LLMs to outperform traditional methods with significantly less reliance on manually annotated data, further reducing the time and resource burden of traditional approaches [,]. However, empirical applications of LLMs to the analysis of public discourse on health AI remain limited, while reported performance on health-related social media data has been inconsistent [,].
In response to these gaps, this study has 2 primary objectives. First, we evaluate the performance of 2 widely used GPT-based LLMs for annotating short, health AI–related social media posts, using zero-shot, few-shot, and chain-of-thought prompting. This analysis offers a reference point that can be revisited as language models and prompting strategies continue to evolve. Second, drawing on the best-performing annotation approach, we conduct a large-scale, descriptive social listening analysis to characterize how health AI is discussed online. Specifically, we examine the distribution of sentiments, discrete emotions, and thematic content expressed in posts related to health AI, and assess variation across geographic sources of posts. Together, these contributions demonstrate the feasibility and limitations of using LLMs for scalable analysis of public discourse on a rapidly evolving topic.
Methods
Overview
A multipronged approach of LLM and NLP tools was used to assess the global sample of X (formerly Twitter) discussions regarding health AI. Post sentiment and positions on health AI confidence were analyzed using an LLM-prompting strategy. Performance was measured using a benchmark dataset and a prompt testing pipeline. Previously validated LLM and NLP methods were applied for emotion detection and discussion themes [,].
Sentiment Analysis Framework
We grounded our exploratory social-listening approach to health AI in methodological advances from vaccine-confidence surveillance, adapting validated, double-coded annotation frameworks that have been applied to large-scale social media data over the past decade. This approach leverages established protocols for construct definition, coder adjudication, and reliability benchmarking to enable systematic monitoring of evolving public attitudes [-]. These studies have adapted the World Health Organization’s (WHO) Confidence, Complacency, and Convenience (“3 Cs”) model of vaccine hesitancy [] to translate a complex decision-making model into analyzable indicators of online discourse (eg, expressions that vaccines are not safe). With this precedent, we reviewed existing studies and the models used to measure attitudes toward AI, such as the prominent technology acceptance model [] and its extensions. From the literature, we identified 6 features of health AI recurrently discussed by experts and the public (). Based on their frequency in the literature, we expected these dimensions to be similarly reflected in an exploratory, high-level analysis of how health AI is discussed online.
| AIa confidence | Definition | GPT-3.5-Turbo performance, weighted F1-score (95% CI) | GPT-4 performance, weighted F1-score (95% CI) |
| Sentiment | The post is positive, negative, or neutral toward AI | 0.71 (0.62-0.79) | 0.74 (0.67-0.81) |
| Safety [,,] | This post indicates AI is safe | 0.64 (0.56-0.72) | 0.79 (0.72-0.86) |
| Usefulness [,,-] | This post indicates AI is useful | 0.69 (0.60-0.76) | 0.77 (0.69-0.84) |
| Trustworthiness [,,] | This post indicates AI is trustworthy | 0.61 (0.52-0.70) | 0.63 (0.56-0.71) |
| Privacy [-,,,,] | This post indicates AI respects privacy | 0.66 (0.57-0.74) | 0.76 (0.68-0.82) |
| Ethics [,,-] | This post indicates AI is ethical | 0.63 (0.55-0.71) | 0.71 (0.62-0.78) |
| Quality [,,,,] | This post indicates AI is of good quality | 0.70 (0.61-0.78) | 0.76 (0.67-0.83) |
aAI: artificial intelligence.
LLM Performance Validation
We implemented an LLM-based sentiment analysis and validation pipeline to assess model performance in processing health AI-related text, aiming to improve communication strategies in health care. Using the operational definitions in , researchers manually labeled a benchmark dataset of 150 posts per health AI domain and overall sentiment variable. Benchmark posts were randomly selected from a separate set of health AI-related posts shared outside of the study period. Posts were labeled as “true,” “false,” or “irrelevant” for their stance within each domain (eg, True: Health AI is safe, False: Health AI is not useful, and Irrelevant: Ethics of Health AI is not mentioned) and as “positive,” “negative,” or “neutral” for overall sentiment. In total, 50 posts were labeled per class.
The benchmark size was selected to support comparative evaluation of model performance under realistic annotation constraints, in line with previous evaluations of LLMs for short-text annotation [-]. The concepts of “importance” and “accessibility” were initially included as additional dimensions of confidence but were excluded due to the scarcity of “false”-labeled posts in the X sample (ie, Health AI is not accessible and Health AI is not important). This resulted in 1050 total annotations across the 6 domains and overall sentiment.
The performance of LLMs GPT-3.5 Turbo (OpenAI) and GPT-4 (OpenAI) was assessed using 4 prompt types—zero-shot, few-shot [], zero-shot chain-of-thought (CoT), and few-shot CoT []. Prompts instructed GPT to classify benchmark posts based on their expressed opinion on the 6 health AI domains and their overall sentiment toward the technology according to the format in .
| Prompt type | Format |
| Zero-shot | Classify tweets as being in agreement (true), disagreement (false), or irrelevant with the belief that AIa is [adjective]. |
| Few-shot | Tweet: Example Tweet 1 Answer: True/False/Irrelevant Tweet: Example Tweet 2 Answer: True/False/Irrelevant Tweet: Example Tweet 3 Answer: True/False/Irrelevant Instruction: Classify tweets as being in agreement (true), disagreement (false), or irrelevant with the belief that AI is [adjective]. |
| Zero-shot CoTb | Classify tweets as being in agreement (true), disagreement (false), or irrelevant with the belief that AI is [adjective]. Let's go step by step. |
| Few-shot CoT | Tweet: Example Tweet 1 Thought: Example Explanation 1 Answer: True/False/Irrelevant Tweet: Example Tweet 2 Thought: Example Explanation 2 Answer: True/False/Irrelevant Tweet: Example Tweet 3 Thought: Example Explanation 3 Answer: True/False/Irrelevant Instruction: Classify tweets as being in agreement (true), disagreement (false), or irrelevant with the belief that AI is [adjective]. |
aAI: artificial intelligence.
bCoT: chain of thought.
This created a dataset of GPT-generated labels for all benchmark posts.
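The four prompt formats above can be sketched as a single prompt-assembly routine. This is a minimal illustration, not the study’s verbatim implementation: the build_prompt helper, its argument names, the example tweet, and the placement of the target tweet at the end of the prompt are assumptions, and the OpenAI chat completions call is shown only as a comment.

```python
def build_prompt(adjective, tweet, examples=None, cot=False):
    """Compose one classification prompt for a health AI domain.

    examples: optional list of dicts with "tweet", "answer", and
    (for few-shot CoT) "thought" keys -- illustrative placeholders here.
    """
    instruction = (
        "Classify tweets as being in agreement (true), disagreement (false), "
        f"or irrelevant with the belief that AI is {adjective}."
    )
    parts = []
    for ex in (examples or []):
        parts.append(f"Tweet: {ex['tweet']}")
        if cot and "thought" in ex:          # few-shot CoT adds a reasoning step
            parts.append(f"Thought: {ex['thought']}")
        parts.append(f"Answer: {ex['answer']}")
    if examples:
        parts.append(f"Instruction: {instruction}")
    else:
        parts.append(instruction)
        if cot:                              # zero-shot CoT trigger phrase
            parts.append("Let's go step by step.")
    parts.append(f"Tweet: {tweet}")
    return "\n".join(parts)

zero_shot = build_prompt("safe", "AI diagnosed my rash instantly!")
# The prompt would then be sent to the chat completions endpoint, eg:
# client.chat.completions.create(model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": zero_shot}], temperature=0)
```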
We defined the performance metric as the weighted F1-score (wF1-score), calculated for each domain as wF1 = (a × F1_true + b × F1_false + c × F1_irrelevant) / (a + b + c), where a, b, and c refer to the number of posts assigned to each category by human annotations, and F1_true, F1_false, and F1_irrelevant refer to the F1-scores for each classification.
This compared the predicted (LLM) labels of “true,” “false,” and “irrelevant” for each domain to the true (researcher) labels for the same posts. Model performance was calculated using nonparametric bootstrapping of 1000 resamples with replacement from each domain’s n=150 annotated items, resulting in domain-specific mean wF1-score and 95% CIs []. Prompt ablation is reported in Figure S3 and Table S3 of .
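The wF1-score and its bootstrap CI can be sketched with the standard library alone (scikit-learn’s f1_score with average="weighted" computes the same statistic). The function names here are illustrative, not the study’s code.

```python
import random
from collections import Counter

def weighted_f1(y_true, y_pred, labels=("true", "false", "irrelevant")):
    """wF1 = (a*F1_true + b*F1_false + c*F1_irrelevant) / (a + b + c),
    where a, b, c are the human-annotation counts for each class."""
    support = Counter(y_true)
    score = 0.0
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        score += support[lab] * (2 * tp / denom if denom else 0.0)
    return score / len(y_true)

def wf1_ci(y_true, y_pred, n_resamples=1000, seed=0):
    """Percentile 95% CI: resample the annotated items with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = sorted(
        weighted_f1([y_true[i] for i in s], [y_pred[i] for i in s])
        for s in (rng.choices(range(n), k=n) for _ in range(n_resamples))
    )
    return stats[int(0.025 * n_resamples)], stats[int(0.975 * n_resamples) - 1]
```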
We prespecified a threshold of wF1-score>0.6 as the minimum acceptable agreement with human annotations. This threshold was selected based on the historic performance of sentiment models and emergent applications of LLMs, like GPT-3.5-Turbo for annotation [,,], along with the inherent subjectivity of the described task.
GPT-3.5-Turbo was chosen for cost efficiency to label the full dataset (). Prompts with wF1-scores >0.6 were passed to the model for labeling (). Verbatim prompt variations, along with model parameters (eg, version, temperature, and random seed), are specified in .
Data Curation
Data for this study were sampled from X, selected because of its status as a major site of health communications for health-related institutions, academia, companies, and interested individuals [], with over 550 million monthly users in 2023 [].
X posts from January 1, 2023, to December 5, 2023 (an 11-month period during which ChatGPT site visits grew by 1 billion []) were collected using AI-related keywords. Keywords included popular LLMs like “ChatGPT,” techniques such as “natural language processing,” and AI domains like “conversational AI,” along with the term “health.” For the full Boolean search string, refer to Figure S1 in . Only English-language tweets were included: because LLM performance on sentiment classification tasks is variable for low- and mid-resource languages [], we restricted the corpus to English to avoid introducing variability in the processing and interpretation of model annotations. In addition, a previous analysis of ChatGPT-related tweets from an overlapping time period found that 72% were in English []. In total, 786,750 posts were selected, including original posts, retweets, and quoted tweets (posts responding to an original post).
Qualitative review of the sample identified posts shared by general users, news sources, academic institutions, health organizations, influencers (especially tech leaders), and private companies. Past estimates suggest that between 9% and 15% of active X users are automated accounts []. This reflects the reality of social media as a site of information exchange among individuals, organizations, and even bots.
Data Cleaning
For sentiment analysis, duplicates were removed from the raw dataset, leaving 394,533 posts.
For emotion analysis and topic modeling, we filtered out posts that only contained hashtagged text or contained 4 or more stock tickers (eg, $XXX). The following preprocessing steps were then applied: emoticons, usernames (starting with “@”), URLs, special characters, “retweet,” “QT” (quote tweet), and stock ticker patterns were removed. Multiple spaces, tabs, and newlines were collapsed into a single space. Duplicates were dropped again, leaving 268,347 posts (Figure S2 in ).
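The preprocessing steps above can be sketched with regular expressions. The exact patterns used in the study are not published, so the regexes below are illustrative assumptions that follow the description (usernames, URLs, stock tickers, retweet/QT markers, special characters, whitespace collapsing).

```python
import re

TICKER = r"\$[A-Za-z]{1,5}\b"  # stock ticker pattern, eg, $XXX (assumed form)

def clean_post(text):
    """Apply the preprocessing steps described above (illustrative sketch)."""
    text = re.sub(r"@\w+", " ", text)                # usernames starting with @
    text = re.sub(r"https?://\S+", " ", text)        # URLs
    text = re.sub(TICKER, " ", text)                 # stock tickers
    text = re.sub(r"\b(?:RT|retweet|QT)\b", " ", text, flags=re.I)
    text = re.sub(r"[^\w\s#]", " ", text)            # emoticons/special chars
    return re.sub(r"\s+", " ", text).strip()         # collapse whitespace

def mostly_tickers(text, limit=4):
    """Flag posts containing 4 or more stock tickers for removal."""
    return len(re.findall(TICKER, text)) >= limit
```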
For topic modeling, the text was converted to lowercase. Tokenization was performed with spaCy (English model en_core_web_lg). We retained only alphabetic tokens, applied lemmatization and lowercasing, and removed tokens that were stopwords from the spaCy default list or had part of speech tags ADV, PRON, CCONJ, PUNCT, PART, DET, ADP, SPACE, NUM, or SYM. Common tokens were standardized, for example, “ml” to “machine learning.”
Emotion Analysis
Compared with research on perspectives toward AI, the role of emotions in the expression and formation of these attitudes is understudied. Previous work has found that individuals with strong anger and less hope show more support for AI upon exposure to information sources []. To explore the emotional content of discussions, we used “Emollama-7b,” an open-source LLM fine-tuned from LLaMA2-7B for affective analysis.
Emotion analysis used the model’s 11 recommended emotional classifications. This includes 6 emotions previously studied for their implication in decision-making, including judgments of science []. A previous benchmark found that Emollama-7b outperforms ChatGPT and GPT-4 for the emotion classification task: “Given a tweet, classify it as ‘neutral or no emotion’ or as one, or more, of eleven given emotions (anger, anticipation, disgust, fear, joy, love, optimism, pessimism, sadness, surprise, trust) that best represent the mental state of the tweeter” [,]. This identical prompt was used to instruct Emollama-7b to label n=268,347 X posts. Default parameters were used based on the model downloaded from Hugging Face []. We clarify that “trust” in the emotion analysis refers to an affective state label from the emotion model and is conceptually distinct from attribute‑level trustworthiness of health AI measured in the sentiment analysis.
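The classification task above can be sketched as a prompt template for Emollama-7b. The task wording is the benchmark prompt quoted in the text; the "Task:/Tweet:/Answer:" layout, the Hugging Face repository id, and the generation settings in the commented lines are assumptions, not the study’s exact configuration.

```python
TASK = ("Given a tweet, classify it as 'neutral or no emotion' or as one, "
        "or more, of eleven given emotions (anger, anticipation, disgust, "
        "fear, joy, love, optimism, pessimism, sadness, surprise, trust) "
        "that best represent the mental state of the tweeter.")

def emotion_prompt(tweet):
    """Wrap one post in the emotion-classification task (assumed layout)."""
    return f"Task: {TASK}\nTweet: {tweet}\nAnswer:"

# Hypothetical usage with Hugging Face transformers (repo id assumed):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("lzw1008/Emollama-7b")
# model = AutoModelForCausalLM.from_pretrained("lzw1008/Emollama-7b")
# ids = tok(emotion_prompt(post), return_tensors="pt").input_ids
# labels = tok.decode(model.generate(ids, max_new_tokens=30)[0])
```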
Topic Modeling
To characterize the thematic dimensions of health AI discussed online, we applied latent Dirichlet allocation (LDA), an unsupervised probabilistic topic-modeling approach in which documents (X posts) are represented as mixtures of latent topics, and topics as distributions over words []. LDA has been widely used to summarize large-scale health-related social media corpora [-].
We implemented LDA using the Gensim LDAMulticore function in Python. Tokens occurring more than 10 times across the corpus and in fewer than 50% of posts were retained in the dictionary to reduce sparsity and remove highly ubiquitous terms. Candidate models ranging from k=3 to k=12 topics were evaluated using the Cv coherence metric. For each model, the training procedure used 10 passes over the corpus and 50 iterations. Hyperparameters (alpha and beta) were retained at default values.
Because LDA inference is stochastic, we assessed topic stability by refitting the k=10 model across 4 random initialization seeds and comparing the consistency of high-probability keywords and overall thematic structure. Although some variation in keyword ordering occurred across runs, the main thematic groupings were consistent. The final model (k=10) was selected based on coherence trends across candidate models and qualitative interpretability, balancing thematic granularity with semantic distinctiveness. Furthermore, 3 researchers (LMW, ZW, and LL) then iteratively refined topic labels by synthesizing the top 10 highest-probability words per topic and validating each label against random samples of posts []. Topics were subsequently summarized by WHO region and emotional distribution.
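The seed-stability check described above can be sketched as comparing top-keyword overlap between runs. The greedy best-match Jaccard score below is an illustrative formalization of "comparing the consistency of high-probability keywords," not the study’s exact procedure.

```python
def jaccard(a, b):
    """Jaccard overlap between two keyword sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def topic_stability(runs):
    """Mean best-match Jaccard between each topic's top words in run 0
    and its closest counterpart in every other seed's run.

    runs: list of runs, each a list of top-keyword lists (one per topic).
    """
    base = runs[0]
    scores = [max(jaccard(topic, cand) for cand in other)
              for other in runs[1:]
              for topic in base]
    return sum(scores) / len(scores)
```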
In-Depth Analysis of Global Public Confidence in Health AI
GPT-3.5-Turbo labeled sentiments and health AI confidence in X posts. After cleaning, 394,533 posts were submitted to GPT-3.5-Turbo for sentiment analysis. Because OpenAI’s policy prohibits discussions of violence, illegal activity, and adult content [], several thousand explicit posts were excluded, resulting in a labeled dataset of 388,009 posts. Posts without a GPT label were excluded from the analysis of each confidence domain.
GPT-labeled posts were grouped by WHO region based on user location, with excluded areas (eg, Hong Kong and Vatican City) added as needed (Table S1 in ). The top 5 countries with the most posts in each region are listed in Table S2 in . Posts annotated as “positive,” “neutral,” and “negative” in sentiment were mapped to 1, 0, and –1, respectively. Averages were calculated as arithmetic means by WHO region, dividing the number of posts in each classification by the total number of posts in each region. 95% CIs are reported using a nonparametric bootstrap with n=1000 resamples. Reach was analyzed as the number of unique users who saw a post. This metric is provided from Meltwater’s (Meltwater US Holdings Inc) application programming interface [], which was used to extract posts from X based on our search criteria. Analyses and visualizations were conducted in Python (v3.11.8).
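The regional sentiment scoring can be sketched in a few lines: labels map to {1, 0, -1}, the regional score is the arithmetic mean, and the 95% CI comes from a nonparametric bootstrap with 1000 resamples. The label distribution in the usage note is illustrative, not the study’s data.

```python
import random

SCORE = {"positive": 1, "neutral": 0, "negative": -1}

def mean_sentiment(labels):
    """Arithmetic mean of posts mapped to 1 / 0 / -1."""
    return sum(SCORE[l] for l in labels) / len(labels)

def sentiment_ci(labels, n_resamples=1000, seed=0):
    """95% percentile CI from a nonparametric bootstrap."""
    rng = random.Random(seed)
    stats = sorted(mean_sentiment(rng.choices(labels, k=len(labels)))
                   for _ in range(n_resamples))
    return stats[int(0.025 * n_resamples)], stats[int(0.975 * n_resamples) - 1]
```

For example, a region with 65% positive, 25% neutral, and 10% negative posts scores (0.65 − 0.10) = 0.55.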
Ethical Considerations
This study analyzed publicly available posts from X (formerly Twitter) and does not constitute human subjects research, as no private or identifiable information was collected and no interaction with individuals occurred. Accordingly, formal ethical review and informed consent were not required. All data were accessed through an authorized third-party data provider and handled in accordance with X’s terms of service. No personally identifiable information was reported; all analyses were conducted at the aggregate level, and no individual users are identified or identifiable in the manuscript or supplementary materials. As this study did not involve human participants, compensation was not applicable.
Results
Sentiment Analysis
Frequency and overall sentiment of posts by WHO region are summarized in . In total, 42 countries shared over 500 posts in the study period. Geographically, most posts came from the Region of the Americas (AMR), accounting for 121,201 (31.38%) posts, followed by the European Region with 57,234 (14.82%) posts. However, a large portion of posts (n=153,289, 39.68%) came from users who did not disclose their location on X.
| | Overall (n=386,287) | AFRa (n=10,624) | AMRb (n=121,201) | EMRc (n=6375) | European Region (n=57,234) | SEARd (n=24,888) | WPRe (n=12,676) | Undisclosed (n=153,289) |
| Positive, n | 252,082 | 7179 | 79,284 | 4736 | 37,848 | 16,974 | 7608 | 98,453 |
| Positive, sample proportionf, % (95% CI) | 65.26 (65.11-65.4) | 67.57 (66.63-68.48) | 65.42 (65.16-65.67) | 74.29 (73.13-75.34) | 66.13 (65.76-66.5) | 68.2 (67.61-68.74) | 60.02 (59.2-60.85) | 64.23 (65.11-65.4) |
| Negative, n | 36,505 | 721 | 11,565 | 227 | 4639 | 930 | 1217 | 17,206 |
| Negative, sample proportion, % (95% CI) | 9.45 (9.36-9.54) | 6.79 (6.31-7.26) | 9.54 (9.37-9.71) | 3.56 (3.11-4) | 8.11 (7.87-8.32) | 3.74 (3.51-3.98) | 9.60 (9.1-10.07) | 11.22 (11.07-11.38) |
| Neutral, n | 97,700 | 2724 | 30,352 | 1412 | 14,747 | 6984 | 3851 | 37,630 |
| Neutral, sample proportion, % (95% CI) | 25.29 (25.15-25.43) | 25.64 (24.83-26.53) | 25.04 (24.79-25.29) | 22.15 (21.08-23.12) | 25.77 (25.4-26.11) | 28.06 (27.47-28.59) | 30.38 (29.55-31.18) | 24.55 (24.34-24.77) |
| Mean sentiment score (SD; 95% CI) | 0.5561 (0.6600; 0.5581-0.5602) | 0.6079 (0.6117; 0.5960-0.6194) | 0.5587 (0.6614; 0.5551-0.5625) | 0.7073 (0.5275; 0.6947-0.7194) | 0.5802 (0.6369; 0.5749-0.5856) | 0.6446 (0.5512; 0.6377-0.6515) | 0.5042 (0.6648; 0.4924-0.5151) | 0.5300 (0.5267-0.5335) |
aAFR: African Region.
bAMR: Region of the Americas.
cEMR: Eastern-Mediterranean Region.
dSEAR: South-East Asia Region.
eWPR: Western Pacific Region.
fSample proportion: Number of posts/total.
The majority of posts were positive (n=252,082, 65.26%) about health AI, followed by neutral (n=97,700, 25.29%) and negative (n=36,505, 9.45%). The sentiment distribution across WHO regions revealed significant differences (χ2=2274.1, P<.001). The Eastern Mediterranean Region (EMR) was most positive about health AI, with nearly three-quarters of posts expressing positive sentiments. The Western Pacific Region (WPR) had the lowest sentiment score (0.50), driven by a higher percentage of neutral sentiment posts (3851/12,676, 30.38%), while posts from undisclosed locations had the highest proportion of negative comments about health AI, at 11.22% (17,206/153,289).
Confidence in Health AI
The majority of posts expressed attitudes in favor of health AI. “Privacy” showed the biggest divide in opinions for or against the technology: 45,977/68,943 (66.69%; 95% CI 66.34%-67.02%) of privacy-related posts affirmed health AI’s respect for privacy, while 22,966/68,943 (33.31%; 95% CI 32.98%-33.66%) were classified as indicating health AI threatened privacy. This concern was highest in AMR and undisclosed regions, with averages in support of privacy falling significantly below the overall sample proportion. In AMR, 14,803/22,660 (65.33%; 95% CI 64.7%-65.96%) of related posts supported health AI’s respect for privacy, while 17,285/26,828 (64.43%; 95% CI 63.88%-64.99%) of posts from undisclosed regions expressed the same belief (). Trustworthiness was the next most divided topic, but 136,700/168,041 (81.35%; 95% CI 81.16%-81.53%) of trust-related posts still voiced trust toward health AI. By comparison, support for the usefulness of health AI was nearly unanimous: 291,457 of 315,528 related posts (92.37%; 95% CI 92.28%-92.47%) endorsed health AI’s usefulness. Support within EMR was significantly higher, at 5321/5479 (97.12%; 95% CI 96.7%-97.57%) of relevant posts.
EMR reported the highest percentage of posts in favor of health AI across all 6 domains. For “usefulness,” “quality,” “safety,” and “trust,” WPR had the lowest percentage of posts in favor, but rose above the overall sample proportion for “privacy.” AMR consistently voiced lower support for health AI concepts than the overall sample. Full results of the sentiment analysis are reported in Table S5 in .
Emotion Analysis and Topic Modeling
Considering the 11 possible emotions explored in discussions of health AI, out of the total sample, 224,386 (83.62%; 95% CI 83.48%-83.76%) posts contained “optimism.” This was followed by “anticipation,” found in 165,300 (61.6%; 95% CI 61.42%-61.77%) posts. “Optimism” was nearly 10 percentage points higher than the overall sample in the South-East Asia Region (SEAR), found in 15,860 posts (91.79%; 95% CI 91.41%-92.2%) and EMR in 4279 (91.63%; 95% CI 90.79%-92.4%) posts. Moreover, 7563 (80.84%; 95% CI 80.07%-81.65%) posts from WPR contained optimism, which was the lowest regional proportion. WPR expressed the highest proportion of all negative emotions: pessimism, disgust, sadness, anger, and fear, at a rate consistently above the overall sample. This included 3021 (32.29%; 95% CI 31.34%-33.23%) posts from the region that contained pessimism. Full emotional results are available in Table S7 in . Posts with “anticipation” reached the most accounts on average (8604.54, 95% CI 8099.95-9136.3), followed by trust (8596.68, 95% CI 7396.50-10,074.38), and fear (7721.15, 95% CI 6455.61-9146.52).
The 10 topics generated by the LDA topic model reflected the diversity of considerations when it comes to health AI, spanning commercial excitement from product developments to ethical considerations of responsible AI development. From the top 5 most popular topics (), the first was “tech industry,” characterized by terms like “business,” “digital,” and “innovation.” These tweets focused on the commercial potential of the digital health revolution, including direct marketing from industry stakeholders. This topic was most common within EMR, found in 3594 (26.55%; 95% CI 25.33%-27.74%) local posts, followed by 11,429 (23.29%; 95% CI 22.65%-23.91%) within SEAR, and 51,956 (17.43%; 95% CI 17.35%-17.8%) within undisclosed regions. The second most popular topic was “algorithms and data,” which frequently used the terms “datum,” “model,” and “patient.” Review of sample posts found discussions of the use of personal health data for health prediction and disease models; this topic was most prominent in AMR, where 47,910 (18.19%; 95% CI 17.92%-18.44%) posts within the region related to it. This was followed by “human-AI alignment,” addressing AI’s alignment with human ethics and values. Notably, 40.25% (14,966/37,184) of these discussions mentioned “mental health,” along with terms like “help” and “people.” “Patient care applications” examined specific AI-driven health solutions and products, such as the creation of drug treatments and diagnostic abilities, referencing words like “treatment,” “improve,” “patient,” and “care.” Finally, “research forums” highlighted emerging research opportunities like conferences and academic positions using words such as “join,” “digital,” “global,” and “work.” Interest in research was predominant within WPR, which shared n=6950 (19.27%; 95% CI 18.5%-20.07%) related posts from the region, followed by the European Region at n=19,192 (18.39%; 95% CI 18.01%-18.8%) posts and the African Region with n=3657 (16.75%; 95% CI 15.92%-17.6%) posts ().

| Topic name | n (%; 95% CI) | Probable terms | Salient terms |
| Tech industry | 42,399 (15.88; 15.67-15.94) | business, ChatGPT, public, digital, ethical, social, technology, education, system, and need | industry, future, technology, revolutionize, innovation, machine learning, finance, transform, potential, and blockchain |
| Algorithms and data | 37,700 (14.05; 13.92-14.19) | datum, model, patient, medical, care, generative, system, new, clinical, and machine learning | patient, datum, medical, improve, treatment, model, care, disease, outcome, and diagnosis |
| Human-AIa alignment | 37,184 (13.86; 13.72-13.99) | mental, people, need, human, help, ChatGPT, support, work, good, and time | mental, people, need, talk, help, ChatGPT, good, therapy, diaspora, and feel |
| Patient care applications | 33,850 (12.6; 12.48-12.73) | patient, care, medical, treatment, improve, disease, diagnosis, help, technology, and doctor | digitalhealth, doctor, healthtech, chatgpt, patient, care, healthit, news, telemedicine, and medicine |
| Research forums | 33,198 (12.37; 12.25-12.49) | research, join, global, digital, work, science, innovation, discuss, project, and future | join, research, discuss, global, register, event, conference, science, university, and discussion |
aAI: artificial intelligence.
For all 10 topics, we report the top terms with topic word probabilities and example posts in Table S7 to support interpretability and reproducibility of labels. The relative frequency of keywords characterizing each of the 10 topics, representing the key terms contributing to the topic model, is plotted in the accompanying figure.
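The “probable terms” reported above are the highest-probability words under each topic’s word distribution. As a hedged sketch (toy counts and a hypothetical vocabulary; the study fit LDA on the full corpus), top terms can be read off a fitted topic-word matrix by normalizing each row to probabilities and ranking:

```python
# Toy topic-word counts (rows = topics, columns = vocabulary terms);
# in the real pipeline these come from a fitted LDA model.
vocab = ["business", "digital", "patient", "datum", "model", "innovation"]
topic_word_counts = [
    [40, 30, 2, 1, 3, 24],   # a "tech industry"-like topic
    [1, 4, 35, 30, 25, 5],   # an "algorithms and data"-like topic
]

def top_terms(counts, vocab, k=3):
    """Normalize one topic's word counts to probabilities and return the k most probable terms."""
    total = sum(counts)
    probs = [c / total for c in counts]
    ranked = sorted(zip(vocab, probs), key=lambda t: t[1], reverse=True)
    return [(w, round(p, 3)) for w, p in ranked[:k]]

for i, counts in enumerate(topic_word_counts):
    print(i, top_terms(counts, vocab))
```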

Topics were differently emotionally charged, and each post could be represented by multiple emotions. In “tech industry,” 41,816 (98.31%; 95% CI 98.19%-98.43%) posts contained optimism. Similarly, in “research forums,” 32,240 (96.98%; 95% CI 96.8%-97.17%) posts contained optimistic language. “Societal impacts” and “algorithms and data” also showed high rates of optimism, yet pessimism was found in 7793 (38.87%; 95% CI 38.23%-39.49%) and 13,874 (36.68%; 95% CI 36.17%-37.2%) posts, respectively, reflecting mixed outlooks on these issues. By comparison, 21,648 posts in “human-AI alignment” and 6948 posts in “AI takeover” contained disgust, making it more prevalent than optimism in both topics. Anger was also common in these topics: in “human-AI alignment,” 15,019 (40.83%; 95% CI 40.31%-41.3%) posts contained anger, while in “AI takeover,” 5564 (51.27%; 95% CI 50.26%-52.17%) did.
“Love,” “surprise,” and “neutral” (no emotion identified) were present in less than 1% of posts and were omitted.

Discussion
Principal Findings
The rapid rise in public attention catalyzed by generative AI—particularly following the release of ChatGPT—coincided with a marked expansion in the scale and visibility of public discourse on health AI in 2023. In this analysis of posts on X using LLM-enabled methods, discussions were predominantly favorable across regions, with perceived usefulness emerging as the strongest driver of confidence. At the same time, privacy was consistently the least trusted dimension, indicating that enthusiasm for the potential of health AI is accompanied by persistent concern about data governance. Emotional patterns reinforced this tension; optimism dominated conversations about innovation, patient care, and research, whereas fear and anger were concentrated in discussions of human-AI alignment and AI takeover. We also identified significant regional variation in these narratives, with EMR and SEAR showing more innovation-oriented and optimistic discussions, while WPR, the European Union, and AMR placed greater emphasis on privacy, ethics, and alignment. Taken together, these findings contribute to the characterization of how health AI is framed and viewed online, showing that public confidence in health AI is not uniform, but multidimensional, regionally shaped, and defined by a balance between promise and concern.
Regional and Thematic Patterns in Health AI Discourse
We found 65% (252,082/386,287) of posts to be positive in sentiment about health AI. This figure is moderately higher than, but directionally consistent with, attitudes toward health AI found on other social media platforms and in surveys. A previous study of X and Reddit posts from 2021 to 2024 showed 55% (566/1022) expressed positive sentiments toward AI in medical imaging [], while 58% (7775/13,502) of patients in a 2023 global survey were positive about AI’s integration into health care []. The somewhat higher positivity in our sample may reflect the study time period’s overlap with significant model developments that generated excitement online. Following the release of ChatGPT in late 2022, conversation volume surged from zero related tweets to 550,000 per day by late January 2023, with average sentiment consistently more than 50% positive in the first few months []. Studies including data before and after 2023 may reflect a stabilization toward a more even distribution of sentiments than we observed. The discussion’s focus on industry and commercial use cases parallels results from a study of social media from China (Weibo [Sina Corporation]) [], a region largely excluded from this study’s sample on X. This suggests global platforms are commonly used to discuss and promote health AI business, followed by discussions of AI’s impact and alignment with society []. In the media’s portrayal of general AI, business tends to be the most common positive angle [], which we also observed in the near-unanimous level of optimism found in posts discussing the health AI tech industry.
We found enthusiastic attitudes were most prevalent in EMR and SEAR, where positive sentiments and optimistic emotions were significantly higher than in other regions. By comparison, EMR and SEAR rarely mentioned human-AI alignment, which was more popular in WPR and undisclosed regions. The high prevalence of tech industry–focused discussions, at 27% in EMR and 23% in SEAR, aligns with a previous study of news media from the countries primarily represented in the EMR and SEAR samples: the United Arab Emirates and India, respectively. Nearly 25% of coverage from the United Arab Emirates focused on industry expansion and economic initiatives, while 21% of articles from India discussed startups []. Newspapers in neither country mentioned regulations, privacy, or ethics []. A potential feedback loop may exist in which supportive media narratives and social media discourse mutually reinforce optimistic narratives about health AI adoption, while simultaneously minimizing discussion of specific concerns more prevalent in other regional media.
On the other hand, social media posts from mostly Western countries in WPR, the European Union, and AMR expressed less confidence in the technology, with greater concern for privacy, ethics, and AI alignment. Sentiment toward health AI was less positive than in EMR and SEAR. Emotions of disgust, pessimism, and fear were elevated in posts from these regions, mirroring an increasing trend of fear-invoking headlines in local newspapers []. In the European Union and WPR, discussions about research, including academic funding, jobs, and conferences, outnumbered mentions of industry. This “responsibility” focus is plausibly explained by regional contexts. In the study period, the European Union was where the first legal framework on AI, the AI Act, was being deliberated [], while AMR’s topical focus on algorithms and data reflects longstanding public concerns over data use and privacy in the United States []. Yet across all regions, privacy was the most consistent concern, substantially present even in EMR and SEAR despite their otherwise positive sentiment and optimistic emotional tone. This convergence suggests that privacy is a near-universal dimension of public unease about health AI that persists regardless of the broader narrative orientation of a region’s discourse. In this case, social media represents an important middle ground of social expression between traditional media sources, which in some regions contain no mention of AI-related privacy issues [], and direct surveys of patients, which suggest concern about health AI privacy is above 50% []. Where in-depth surveying is limited or not feasible, social media can be analyzed to surface latent concerns that traditional media underreports. The privacy concerns found in this study suggest data governance is a cross-cutting barrier to health AI adoption, even in regions where the broader online narrative appears favorable.
Taken together, these regional patterns demonstrate that social media serves as a real-time, decentralized information environment where competing narratives about AI unfold, distinct from both institutional media and individually held viewpoints. A single global communication strategy for health AI is therefore unlikely to be effective; communication efforts must instead address how the proliferation of social media discourse surrounding AI might shape real-world public trust in health technologies. The concentration of high-arousal negative emotions around existential AI narratives is particularly consequential; negative, emotionally charged content is associated with greater spread of posts, including misinformation [,], which could explain why posts using fear to discuss health AI reached close to 8000 users on average. Drawing on previous experiences with vaccine hesitancy, online misinformation is associated with and can affect health technology beliefs and adoption intention [,], including a negative relationship between misinformation consumption and COVID-19 vaccine uptake [,]. Policymakers and health authorities should consider integrating AI-enabled social listening into health technology assessment, risk communication, and regulatory monitoring to proactively anticipate emerging concerns and enable more responsive, publicly aligned health AI governance.
Practical Implications of LLMs
This study evaluated LLM-based annotation for health-related social media analysis using the AI Usage Consideration Checklist. In terms of accuracy of information, a few-shot CoT approach frequently outperformed the off-the-shelf sentiment tools and zero-shot LLMs reported in previous evaluations, and achieved performance comparable to fine-tuned models on the same metric (wF1-score) [,]. This suggests that few-shot CoT prompting can meaningfully improve annotations for social media analysis, an area where traditional sentiment tools often perform poorly [,]. GPT-4 outperformed GPT-3.5-Turbo across annotation tasks, achieving wF1-scores above 0.7, and continued model development suggests further performance gains are likely.
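The wF1-score used for benchmarking is the per-class F1-score averaged with weights proportional to each class’s support in the human-annotated labels. A minimal pure-Python sketch (the labels are hypothetical; the computation is equivalent to scikit-learn’s `f1_score(..., average="weighted")`):

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """F1 per class, averaged with weights given by each class's support in y_true."""
    support = Counter(y_true)
    n = len(y_true)
    wf1 = 0.0
    for c in support:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        wf1 += (support[c] / n) * f1
    return wf1

# Hypothetical human vs model labels for 4 posts
human = ["positive", "positive", "negative", "negative"]
model = ["positive", "negative", "negative", "negative"]
print(round(weighted_f1(human, model), 3))  # 0.733
```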
Regarding usability, the LLM pipeline was accessible to noncomputer science researchers via natural language instructions and simple Python scripts, lowering the barrier to adoption for public health institutions with otherwise limited machine learning capacity. LLMs also enabled annotation of over 300,000 posts, a scale not attained through traditional manual content analysis on the same topic [,].
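The natural language instructions and simple Python scripts mentioned above typically amount to assembling a prompt per post and parsing the model’s structured reply. The following is a hedged illustration only: the function names, label schema, and prompt wording are hypothetical (the study’s actual prompts and parameter settings are provided in its multimedia appendix), and the network call to a chat-completions endpoint is indicated but not made:

```python
import json

# Hypothetical prompt skeleton; not the study's actual prompt.
SYSTEM = (
    "You label posts about health AI. Return JSON with keys "
    '"sentiment" (positive/neutral/negative) and "privacy" (trusting/concerned/not_mentioned).'
)

def build_messages(post: str) -> list:
    """Assemble a chat-style message list for one post."""
    return [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": f"Post: {post}\nThink step by step, then answer in JSON."},
    ]

def parse_label(raw: str) -> dict:
    """Extract the JSON object from a model reply, tolerating surrounding reasoning text."""
    start, end = raw.find("{"), raw.rfind("}") + 1
    return json.loads(raw[start:end])

# In the real pipeline, build_messages(post) would be sent to the model API
# and parse_label applied to each reply before aggregation.
reply = 'Reasoning... {"sentiment": "positive", "privacy": "concerned"}'
print(parse_label(reply))
```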
Limitations and Future Research
This study has several limitations. First, although all annotation tasks achieved wF1-scores above 0.6, performance varied across classification constructs, and complex dimensions, such as trust in AI, remain challenging given the lack of definitional consensus even among human experts []. Our analysis distinguished between perceived trustworthiness and affective trust [] as a pragmatic solution in the absence of an established framework for confidence in health AI; however, this approach is operationalized for scalable social listening and should be interpreted as a descriptive monitoring framework using constructs informed by the literature, rather than a validated psychometric instrument. Second, misclassification is possible due to LLM tendencies to exaggerate interpretations of positive or negative sentiments [], along with classification challenges when short texts contain sarcasm or irony, which both humans and language models struggle with []. To mitigate these risks, we implemented a structured annotation framework with predefined categories, applied benchmarking thresholds, and reported attribute-level patterns rather than aggregate polarity scores alone. Future work should incorporate formal interannotator validation, larger benchmarks, and model comparisons. Constructs should also be refined and validated against survey-based or experimental measures to strengthen conceptual alignment and elucidate relationships between the online health AI landscape and offline attitudes toward such technologies. Third, our analysis was restricted to global English-language posts on X, which may underrepresent non–English-speaking populations and regions where the platform is restricted. However, this focus reflects the empirical context of data collection in 2023, immediately following the release and widespread uptake of predominantly English-language generative AI systems, during which early public discourse was largely English-dominated [].
As AI systems become increasingly multilingual and globally embedded, future studies should expand to non-English datasets and additional platforms to enhance cross-cultural validity and representativeness. Finally, model performance and construct mapping are time- and context-specific, and both public discourse and model behavior may shift over time. Nevertheless, this adaptability is also a strength; computational approaches can be rerun and recalibrated as language and technology evolve. We encourage periodic re-evaluation of model performance and updated benchmarking datasets.
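The formal interannotator validation recommended among these limitations is commonly quantified with Cohen's kappa, which corrects raw agreement between two annotators for agreement expected by chance. A minimal sketch with hypothetical labels:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Chance-corrected agreement between two annotators' label lists of equal length."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: product of each annotator's marginal label frequencies
    expected = sum((ca[label] / n) * (cb[label] / n) for label in set(a) | set(b))
    return (observed - expected) / (1 - expected)

# Two hypothetical annotators labeling 4 posts
r1 = ["pos", "pos", "neg", "neg"]
r2 = ["pos", "neg", "neg", "neg"]
print(cohens_kappa(r1, r2))  # 0.5
```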
This study advances previous social media analyses of health AI [,] by moving beyond basic sentiment classification to enable multidimensional characterization of public confidence across 42 countries during a period of rapid growth in AI-related online activity. The integration of LLM-based stance and emotion annotation with LDA topic modeling allowed for context-sensitive classification of complex constructs at scale. These findings establish a reproducible framework for scalable surveillance of online attitudes in digital health research that can be extended to other health technologies as they enter public discourse and recalibrated as the language used to discuss AI continues to evolve. For policymakers, the regional and thematic granularity of discourse data offers detailed insight into what information online populations are exposed to and how digital ecosystems respond to changing developments in both technology and regulation. For health system leaders and communicators, real-time social listening of this kind can anticipate concerns before implementation, including in regions where direct surveying is limited, clarifying which aspects of the technology are viewed positively in popular discourse and which still face resistance.
Acknowledgments
The authors declare the use of generative artificial intelligence (GenAI) in the research and writing process. According to the GAIDeT taxonomy (2025), the following tasks were delegated to GenAI tools under full human supervision: (1) proofreading and editing, (2) summarizing text, and (3) reformatting.
The GenAI tool used was ChatGPT-4.0.
Responsibility for the final manuscript lies entirely with the authors.
GenAI tools are not listed as authors and do not bear responsibility for the final outcomes.
Funding
This study was supported by the InnoHK initiative of the Innovation and Technology Commission of the Hong Kong Special Administrative Region. The funders had no involvement in the study design, data collection, analysis, interpretation, or the writing of the manuscript.
Data Availability
The social media data analyzed in this study cannot be made publicly available due to licensing and contractual restrictions governing access to the platform data. Researchers seeking to reproduce the findings should obtain independent access to X data through authorized third-party providers or formal research agreements with the platform. The full search strategy and filtering criteria used to construct the dataset are described in the supplementary materials to support methodological transparency and reproducibility.
Conflicts of Interest
None declared.
Description of methodology and full results of all analyses. DOC File, 476 KB
GPT prompts and parameter settings. DOC File, 74 KB
AI usage consideration checklist. DOC File, 271 KB
References
- Cascella M, Semeraro F, Montomoli J, Bellini V, Piazza O, Bignami E. The breakthrough of large language models release for medical applications: 1-year timeline and perspectives. J Med Syst. Feb 17, 2024;48(1):22. [FREE Full text] [CrossRef] [Medline]
- Senthil R, Anand T, Somala CS, Saravanan KM. Bibliometric analysis of artificial intelligence in healthcare research: Trends and future directions. Future Healthc J. Sep 2024;11(3):100182. [FREE Full text] [CrossRef] [Medline]
- Yin J, Ngiam KY, Teo HH. Role of Artificial Intelligence Applications in Real-Life Clinical Practice: Systematic Review. J Med Internet Res. Apr 22, 2021;23(4):e25759. [FREE Full text] [CrossRef] [Medline]
- Marco-Ruiz L, Hernández MÁT, Ngo PD, Makhlysheva A, Svenning TO, Dyb K, et al. A multinational study on artificial intelligence adoption: Clinical implementers' perspectives. Int J Med Inform. Apr 2024;184:105377. [FREE Full text] [CrossRef] [Medline]
- Mohsin Khan M, Shah N, Shaikh N, Thabet A, Alrabayah T, Belkhair S. Towards secure and trusted AI in healthcare: A systematic review of emerging innovations and ethical challenges. Int J Med Inform. Mar 2025;195:105780. [FREE Full text] [CrossRef] [Medline]
- Sahoo RK, Sahoo KC, Negi S, Baliarsingh SK, Panda B, Pati S. Health professionals' perspectives on the use of Artificial Intelligence in healthcare: A systematic review. Patient Educ Couns. May 2025;134:108680. [CrossRef] [Medline]
- Young AT, Amara D, Bhattacharya A, Wei ML. Patient and general public attitudes towards clinical artificial intelligence: a mixed methods systematic review. Lancet Digit Health. Sep 2021;3(9):e599-e611. [FREE Full text] [CrossRef] [Medline]
- Beets B, Newman TP, Howell EL, Bao L, Yang S. Surveying Public Perceptions of Artificial Intelligence in Health Care in the United States: Systematic Review. J Med Internet Res. Apr 04, 2023;25:e40337. [FREE Full text] [CrossRef] [Medline]
- Moy S, Irannejad M, Manning SJ, Farahani M, Ahmed Y, Gao E, et al. Patient Perspectives on the Use of Artificial Intelligence in Health Care: A Scoping Review. J Patient Cent Res Rev. Apr 02, 2024;11(1):51-62. [FREE Full text] [CrossRef] [Medline]
- Gao S, He L, Chen Y, Li D, Lai K. Public Perception of Artificial Intelligence in Medical Care: Content Analysis of Social Media. J Med Internet Res. Jul 13, 2020;22(7):e16649. [FREE Full text] [CrossRef] [Medline]
- Fütterer T, Fischer C, Alekseeva A, Chen X, Tate T, Warschauer M, et al. ChatGPT in education: global reactions to AI innovations. Sci Rep. Sep 15, 2023;13(1):15310. [FREE Full text] [CrossRef] [Medline]
- Survey on the impact of online disinformation and hate speech. UNESCO. 2023. URL: https://www.unesco.org/sites/default/files/medias/fichiers/2023/11/unesco_ipsos_survey.pdf [accessed 2026-04-01]
- Milli S, Carroll M, Wang Y, Pandey S, Zhao S, Dragan AD. Engagement, user satisfaction, and the amplification of divisive content on social media. PNAS Nexus. Mar 2025;4(3):pgaf062. [FREE Full text] [CrossRef] [Medline]
- Ferrara E, Yang Z. Quantifying the effect of sentiment on information diffusion in social media. PeerJ Computer Science. Sep 30, 2015;1:e26. [CrossRef]
- Chuai Y, Zhao J. Anger can make fake news viral online. Front. Phys. Aug 22, 2022;10. [CrossRef]
- Han J, Cha M, Lee W. Anger contributes to the spread of COVID-19 misinformation. HKS Misinfo Review. Sep 17, 2020;1(3). [CrossRef]
- Wubineh BZ, Deriba FG, Woldeyohannis MM. Exploring the opportunities and challenges of implementing artificial intelligence in healthcare: A systematic literature review. Urol Oncol. Mar 2024;42(3):48-56. [FREE Full text] [CrossRef] [Medline]
- Martel C, Pennycook G, Rand DG. Reliance on emotion promotes belief in fake news. Cogn Res Princ Implic. Oct 07, 2020;5(1):47. [FREE Full text] [CrossRef] [Medline]
- Gu C, Zhang Y, Zeng L. Exploring the mechanism of sustained consumer trust in AI chatbots after service failures: a perspective based on attribution and CASA theories. Humanit Soc Sci Commun. Oct 22, 2024;11(1):1-12. [CrossRef]
- Novozhilova E, Mays K, Paik S, Katz J. More Capable, Less Benevolent: Trust Perceptions of AI Systems across Societal Contexts. MAKE. Feb 05, 2024;6(1):342-366. [CrossRef]
- Afroogh S, Akbari A, Malone E, Kargar M, Alambeigi H. Trust in AI: progress, challenges, and future directions. Humanit Soc Sci Commun. Nov 18, 2024;11(1):1-30. [CrossRef]
- Almanaa M. Trends and Public Perception of Artificial Intelligence in Medical Imaging: A Social Media Analysis. Cureus. Sep 2024;16(9):e70008. [CrossRef] [Medline]
- Mao Y, Liu Q, Zhang Y. Sentiment analysis methods, applications, and challenges: A systematic literature review. Journal of King Saud University - Computer and Information Sciences. Apr 2024;36(4):102048. [CrossRef]
- He L, Yin T, Zheng K. They May Not Work! An evaluation of eleven sentiment analysis tools on seven social media datasets. J Biomed Inform. Aug 2022;132:104142. [FREE Full text] [CrossRef] [Medline]
- Guido R, Ferrisi S, Lofaro D, Conforti D. An Overview on the Advancements of Support Vector Machine Models in Healthcare Applications: A Review. Information. Apr 19, 2024;15(4):235. [CrossRef]
- Krugmann JO, Hartmann J. Sentiment Analysis in the Age of Generative AI. Cust. Need. and Solut. Mar 05, 2024;11(1):3. [CrossRef]
- Zhang W, Deng Y, Liu B, Pan SJ, Bing L. Sentiment Analysis in the Era of Large Language Models: A Reality Check. ArXiv. Preprint posted online on June 14, 2023. [FREE Full text] [CrossRef]
- Wang Z, Pang Y, Lin Y, Zhu X. Adaptable and reliable text classification using large language models. ArXiv. Preprint posted online on December 7, 2024. [FREE Full text] [CrossRef]
- Lossio-Ventura JA, Weger R, Lee AY, Guinee EP, Chung J, Atlas L, et al. A Comparison of ChatGPT and Fine-Tuned Open Pre-Trained Transformers (OPT) Against Widely Used Sentiment Analysis Tools: Sentiment Analysis of COVID-19 Survey Data. JMIR Ment Health. Jan 25, 2024;11:e50150. [FREE Full text] [CrossRef] [Medline]
- He L, Omranian S, McRoy S, Zheng K. Using Large Language Models for sentiment analysis of health-related social media data: empirical evaluation and practical tips. AMIA Annu Symp Proc. 2024;2024:503-512. [FREE Full text] [Medline]
- Albalawi R, Yeap TH, Benyoucef M. Using Topic Modeling Methods for Short-Text Data: A Comparative Analysis. Front Artif Intell. 2020;3:42. [FREE Full text] [CrossRef] [Medline]
- Liu Z, Yang K, Xie Q, Zhang T, Ananiadou S. EmoLLMs: A series of emotional large language models and annotation tools for comprehensive affective analysis. 2024. Presented at: KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2024 August 25 - 29:5487-5496; Barcelona Spain. URL: https://dl.acm.org/doi/10.1145/3637528.3671552 [CrossRef]
- Hou Z, Tong Y, Du F, Lu L, Zhao S, Yu K, et al. Assessing COVID-19 Vaccine Hesitancy, Confidence, and Public Engagement: A Global Social Listening Study. J Med Internet Res. Jun 11, 2021;23(6):e27632. [FREE Full text] [CrossRef] [Medline]
- Zhou X, Zhang X, Larson H, de Figueiredo A, Jit M, Fodeh S, et al. Spatiotemporal trends in COVID-19 vaccine sentiments on a social media platform and correlations with reported vaccine coverage. Bull World Health Organ. Jan 01, 2024;102(1):32-45. [FREE Full text] [CrossRef] [Medline]
- Xu J, Wu Z, Wass L, Larson HJ, Lin L. Mapping global public perspectives on mRNA vaccines and therapeutics. NPJ Vaccines. Nov 14, 2024;9(1):218. [FREE Full text] [CrossRef] [Medline]
- MacDonald NE, SAGE Working Group on Vaccine Hesitancy. Vaccine hesitancy: Definition, scope and determinants. Vaccine. Aug 14, 2015;33(34):4161-4164. [FREE Full text] [CrossRef] [Medline]
- Kelly S, Kaye S, Oviedo-Trespalacios O. What factors contribute to the acceptance of artificial intelligence? A systematic review. Telematics and Informatics. Feb 2023;77:101925. [CrossRef]
- Ali O, Abdelbaki W, Shrestha A, Elbasi E, Alryalat MAA, Dwivedi YK. A systematic literature review of artificial intelligence in the healthcare sector: Benefits, challenges, methodologies, and functionalities. Journal of Innovation & Knowledge. Jan 2023;8(1):100333. [CrossRef]
- Hassan N, Slight R, Bimpong K, Bates DW, Weiand D, Vellinga A, et al. Systematic review to understand users perspectives on AI-enabled decision aids to inform shared decision making. NPJ Digit Med. Nov 21, 2024;7(1):332. [FREE Full text] [CrossRef] [Medline]
- Wu C, Xu H, Bai D, Chen X, Gao J, Jiang X. Public perceptions on the application of artificial intelligence in healthcare: a qualitative meta-synthesis. BMJ Open. Jan 04, 2023;13(1):e066322. [FREE Full text] [CrossRef] [Medline]
- Ibrahim F, Münscher J-C, Daseking M, Telle N. The technology acceptance model and adopter type analysis in the context of artificial intelligence. Front Artif Intell. Jan 16, 2025;7:1496518. [FREE Full text] [CrossRef] [Medline]
- Alizadeh M, Kubli M, Samei Z, Dehghani S, Zahedivafa M, Bermeo JD, et al. Open-source LLMs for text annotation: a practical guide for model setting and fine-tuning. J Comput Soc Sci. Dec 18, 2024;8(1):17. [CrossRef] [Medline]
- Espinosa L, Salathé M. Use of large language models as a scalable approach to understanding public health discourse. PLOS Digit Health. Oct 2024;3(10):e0000631. [CrossRef] [Medline]
- Deiner MS, Deiner NA, Hristidis V, McLeod SD, Doan T, Lietman TM, et al. Use of Large Language Models to Assess the Likelihood of Epidemics From the Content of Tweets: Infodemiology Study. J Med Internet Res. Mar 01, 2024;26:e49139. [FREE Full text] [CrossRef] [Medline]
- Brown TB, Mann B, Ryder N. Language Models are Few-Shot Learners. ArXiv. Preprint posted online on July 22, 2020. [FREE Full text] [CrossRef]
- Wei J, Wang X, Schuurmans D. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. ArXiv. Preprint posted online on January 10, 2023. [FREE Full text] [CrossRef]
- Bojić L, Zagovora O, Zelenkauskaite A, Vuković V, Čabarkapa M, Veseljević Jerković S, et al. Comparing large Language models and human annotators in latent content analysis of sentiment, political leaning, emotional intensity and sarcasm. Sci Rep. Apr 03, 2025;15(1):11477. [FREE Full text] [CrossRef] [Medline]
- Zhu Y, Zhang P, Haq E, Hui P, Tyson G. Can ChatGPT reproduce human-generated labels? A study of social computing tasks. ArXiv. Preprint posted online on April 22, 2023. [FREE Full text] [CrossRef]
- Ola O, Sedig K. Understanding Discussions of Health Issues on Twitter: A Visual Analytic Study. Online J Public Health Inform. May 16, 2020;12(1):e2. [FREE Full text] [CrossRef] [Medline]
- X (Formerly Twitter) User Age, Gender, & Demographic Stats. Exploding Topics. 2024. URL: https://explodingtopics.com/blog/x-user-stats [accessed 2025-02-03]
- Number of ChatGPT users. Exploding Topics. URL: https://explodingtopics.com/blog/chatgpt-users [accessed 2024-08-16]
- Lai VD, Ngo N, Pouran BVA. ChatGPT beyond English: Towards a comprehensive evaluation of large language models in multilingual learning. Association for Computational Linguistics; 2023. Presented at: Findings of the Association for Computational Linguistics: EMNLP 2023; 2023 December 6-10:13171-13189; Singapore. [CrossRef]
- Varol O, Ferrara E, Davis C, Menczer F, Flammini A. Online human-bot interactions: Detection, estimation, and characterization. In: ICWSM. Online Human-Bot Interactions. Detection, Estimation, and Characterization. Proceedings of the International AAAI Conference on Web and Social Media AAAI Press; 2017. Presented at: International AAAI Conference on Web and Social Media; 2017 May 15-18:280-289; Montreal, Canada. URL: https://cdn.aaai.org/ojs/14871/14871-28-18390-1-2-20201228.pdf [CrossRef]
- Choi S, Lee C, Park A, Lee JA. How the Public Makes Sense of Artificial Intelligence: The Interplay Between Communication and Discrete Emotions. Science Communication. Nov 23, 2024;47(4):553-584. [CrossRef]
- Drummond C, Fischhoff B. Emotion and judgments of scientific research. Public Underst Sci. Apr 2020;29(3):319-334. [CrossRef] [Medline]
- Buechel S, Hahn U. EmoBank: Studying the impact of annotation perspective and representation format on dimensional emotion analysis. Association for Computational Linguistics; 2017. Presented at: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers; 2017 April 3-7:578-585; Valencia, Spain. [CrossRef]
- lzw1008/Emollama-7b. Hugging Face. URL: https://huggingface.co/lzw1008/Emollama-7b [accessed 2026-02-03]
- Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003;3:993-1022. URL: https://www.researchgate.net/publication/221620547_Latent_Dirichlet_Allocation [accessed 2024-08-16]
- Khan A, Ali R. Measuring the effectiveness of LDA-based clustering for social media data. IEEE; 2023. Presented at: 2023 International Conference on Advances in Intelligent Computing and Applications (AICAPS); 2023 February 1–3:1-8; Kochi, India. [CrossRef]
- Molenaar A, Lukose D, Brennan L, Jenkins EL, McCaffrey TA. Using Natural Language Processing to Explore Social Media Opinions on Food Security: Sentiment Analysis and Topic Modeling Study. J Med Internet Res. Mar 21, 2024;26:e47826. [FREE Full text] [CrossRef] [Medline]
- Huangfu L, Mo Y, Zhang P, Zeng DD, He S. COVID-19 Vaccine Tweets After Vaccine Rollout: Sentiment-Based Topic Modeling. J Med Internet Res. Feb 08, 2022;24(2):e31726. [FREE Full text] [CrossRef] [Medline]
- Leung YT, Khalvati F. Exploring COVID-19-Related Stressors: Topic Modeling Study. J Med Internet Res. Jul 13, 2022;24(7):e37142. [FREE Full text] [CrossRef] [Medline]
- Weston SJ, Shryock I, Light R, Fisher PA. Selecting the Number and Labels of Topics in Topic Modeling: A Tutorial. Advances in Methods and Practices in Psychological Science. May 25, 2023;6(2):251524592311601. [CrossRef]
- Usage Policies. OpenAI. URL: https://openai.com/policies/usage-policies/ [accessed 2024-08-16]
- Meltwater. URL: https://www.meltwater.com/en [accessed 2024-08-16]
- Busch F, Hoffmann L, Xu L, Zhang LJ, Hu B, García-Juárez I, COMFORT consortium, et al. Multinational Attitudes Toward AI in Health Care and Diagnostics Among Hospital Patients. JAMA Netw Open. Jun 02, 2025;8(6):e2514452. [FREE Full text] [CrossRef] [Medline]
- Ryazanov I, Öhman C, Björklund J. How ChatGPT Changed the Media’s Narratives on AI: A Semi-automated Narrative Analysis Through Frame Semantics. Minds & Machines. Nov 23, 2024;35(1):2. [CrossRef]
- Ittefaq M, Zain A, Arif R, Ala-Uddin M, Ahmad T, Iqbal A. Global news media coverage of artificial intelligence (AI): A comparative analysis of frames, sentiments, and trends across 12 countries. Telematics and Informatics. Jan 2025;96:102223. [CrossRef]
- Gilbert S. The EU passes the AI Act and its implications for digital medicine are unclear. NPJ Digit Med. May 22, 2024;7(1):135. [FREE Full text] [CrossRef] [Medline]
- Osnat B. Patient perspectives on artificial intelligence in healthcare: A global scoping review of benefits, ethical concerns, and implementation strategies. Int J Med Inform. Nov 2025;203:106007. [CrossRef] [Medline]
- McLoughlin KL, Brady WJ, Goolsbee A, Kaiser B, Klonick K, Crockett MJ. Misinformation exploits outrage to spread online. Science. Nov 29, 2024;386(6725):991-996. [CrossRef] [Medline]
- Loomba S, de Figueiredo A, Piatek SJ, de Graaf K, Larson HJ. Measuring the impact of COVID-19 vaccine misinformation on vaccination intent in the UK and USA. Nat Hum Behav. Mar 2021;5(3):337-348. [CrossRef] [Medline]
- Allen J, Watts DJ, Rand DG. Quantifying the impact of misinformation and vaccine-skeptical content on Facebook. Science. May 31, 2024;384(6699):eadk3451. [CrossRef] [Medline]
- Pierri F, Perry BL, DeVerna MR, Yang K, Flammini A, Menczer F, et al. Online misinformation is linked to early COVID-19 vaccination hesitancy and refusal. Sci Rep. Apr 26, 2022;12(1):5966. [FREE Full text] [CrossRef] [Medline]
- Romer D, Winneg KM, Jamieson PE, Brensinger C, Jamieson KH. Misinformation about vaccine safety and uptake of COVID-19 vaccines among adults and 5-11-year-olds in the United States. Vaccine. Oct 26, 2022;40(45):6463-6470. [FREE Full text] [CrossRef] [Medline]
- Gille F, Jobin A, Ienca M. What we talk about when we talk about trust: Theory of trust for AI in healthcare. Intelligence-Based Medicine. Nov 2020;1-2:100001. [CrossRef]
- Bormann I, Bempreiksz‐Luthardt J, Niedlich S. Exploring the Interplay of Cognition and Emotion in Trust Relationships With Cognitive‐Affective Maps. Personal Relationships. Jan 23, 2025;32(1):e12587. [CrossRef]
Abbreviations
- AI: artificial intelligence
- AMR: Region of the Americas
- CoT: Chain of Thought
- EMR: Eastern Mediterranean Region
- LDA: latent Dirichlet allocation
- LLM: large language model
- NLP: natural language processing
- SEAR: South-East Asia Region
- wF1-score: weighted F1-score
- WHO: World Health Organization
- WPR: Western Pacific Region
Edited by S Brini; submitted 27.Jul.2025; peer-reviewed by X Lin, M Vochozka; comments to author 01.Oct.2025; accepted 24.Mar.2026; published 05.May.2026.
Copyright©Lily Minh Wass, Zhengdong Wu, José Vizoso, Joseph T Wu, Leesa Lin. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 05.May.2026.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.